Libraries
library(FactoMineR)
library(tidyr)
library(dplyr)
library(tidyverse)
library(magrittr)
library(ggplot2)
library(ggpubr)
library(factoextra)
library(gridExtra)
library(moments)
Screw Caps Data
raw_data <- read.table("ScrewCaps.csv",header=TRUE, sep=",", dec=".", row.names=1)
head(raw_data)
summary(raw_data)
Numeric variables:
               Min.     1st Qu.  Median   Mean     3rd Qu.  Max.
Diameter       0.4458   0.7785   1.0120   1.2843   1.2886   5.3950
weight         0.610    1.083    1.400    1.701    1.704    7.112
nb.of.pieces   2.000    3.000    4.000    4.113    5.000    10.000
Mature.Volume  1000     15000    45000    96930    115000   800000
Price          6.477    11.807   14.384   16.444   18.902   46.610
Length         3.369    6.161    8.086    10.247   10.340   43.359
Categorical variables (counts):
Supplier:        Supplier A 31, Supplier B 150, Supplier C 14
Shape:           Shape 1 134, Shape 2 45, Shape 3 8, Shape 4 8
Impermeability:  Type 1 172, Type 2 23
Finishing:       Hot Printing 62, Lacquering 133
Raw.Material:    ABS 21, PP 148, PS 26
2) We start with univariate and bivariate descriptive statistics. Using appropriate plot(s) or summaries answer the following questions.
a) What does the distribution of Price look like? Comment on your plot with respect to the quartiles of Price.
From the quantile output, the median, first quartile and third quartile of Price are 14.384, 11.807 and 18.902 respectively.
The density plot suggests a right-skewed distribution with a possible second, minor mode: the main mode is around 14 and the antimode (the trough between the two modes) is around 29. The skewness (1.71 > 0) confirms the right skew, and the kurtosis (6.40, above the value of 3 for a normal distribution) indicates heavier tails than a normal distribution. Half of the prices lie between the first and third quartiles, 11.807 and 18.902, which is consistent with the graph: most of the density is concentrated in this range, with a long right tail beyond it.
The boxplot supports this analysis and flags the values in the right tail as outliers, i.e. prices above Q3 + 1.5 × IQR = 18.902 + 1.5 × 7.095 ≈ 29.5, close to the antimode.
# Density (as counts) of Price with the median marked
price_density <- ggdensity(raw_data, x = "Price", y = "..count..",
                           color = "darkblue", fill = "lightblue", size = 0.5,
                           alpha = 0.2, linetype = "solid", add = "median",
                           title = "Screw Cap Price Distribution") +
  font("title", size = 12, face = "bold")
# Horizontal boxplot of Price; outliers drawn as crosses (point shape 4)
price_boxplot <- ggboxplot(raw_data$Price, width = 0.1, fill = "lightgray",
                           outlier.colour = "darkblue", outlier.shape = 4,
                           ylab = "Price", xlab = "Screw Caps", title = "Price Box Plot") +
  rotate() + font("title", size = 12, face = "bold")
price_quantile <- quantile(raw_data$Price)
ggarrange(price_density, price_boxplot, ncol = 1, nrow = 2)
price_quantile
0% 25% 50% 75% 100%
6.477451 11.807022 14.384413 18.902429 46.610372
skewness(raw_data$Price)
[1] 1.706151
kurtosis(raw_data$Price)
[1] 6.395453
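The interpretation above relies on moment-based skewness and kurtosis and on the boxplot's 1.5 × IQR rule. A minimal base-R sketch of these quantities follows. It assumes that `moments::skewness()` and `moments::kurtosis()` use the population moment ratios below (so the kurtosis of a normal distribution is 3, not 0) and that the boxplot uses the default `quantile()` type 7. The helper names are hypothetical.

```r
# Moment-ratio skewness: third central moment / (second central moment)^(3/2)
skew_moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Pearson kurtosis: fourth central moment / (second central moment)^2
# (about 3 for normal data, so 6.4 means heavier tails than normal)
kurt_moment <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Values beyond the Tukey fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
tukey_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
}

# Usage with hypothetical toy vectors
skew_moment(c(1, 2, 3, 4, 5))         # 0: symmetric
kurt_moment(c(1, 2, 3, 4, 5))         # 1.7: lighter tails than normal
tukey_outliers(c(1, 2, 3, 4, 5, 20))  # 20 lies beyond Q3 + 1.5 * IQR = 8.5
```

On the real data, `tukey_outliers(raw_data$Price)` should list the same points that the boxplot draws as crosses.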
b) Does the Price depend on the Length? weight?
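We examine Price vs. Length and log(Price) vs. log(Length), then Price vs. weight and log(Price) vs. log(weight), and give the regression summary for each.
The plots suggest a positive, roughly linear relationship in all four cases, and the t- and F-tests confirm it: every slope is significant (p < 2e-16), with R² of 0.64 (Length), 0.58 (log Length), 0.62 (weight) and 0.54 (log weight). So Price does depend on both Length and weight. Because Length and weight are themselves strongly correlated, these simple regressions cannot separate their individual effects.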
price_length <- ggplot(raw_data, aes(x=Length, y=Price)) + geom_point() + geom_smooth(method=lm, color="darkgreen")+ theme_minimal()
price_length_log <- ggplot(raw_data, aes(x=log(Length), y=log(Price))) + geom_point() + geom_smooth(method=lm, color="darkgreen")+ theme_minimal()
price_weight <- ggplot(raw_data, aes(x=weight, y=Price)) + geom_point() + geom_smooth(method=lm,color="red")+theme_minimal()
price_weight_log <- ggplot(raw_data, aes(x=log(weight), y=log(Price))) + geom_point() + geom_smooth(method=lm,color="red")+theme_minimal()
ggarrange(ggarrange(price_length, price_length_log, ncol = 2, nrow = 1), ggarrange(price_weight, price_weight_log, ncol = 2, nrow = 1), ncol = 1, nrow = 2)
summary(lm(formula = Price ~ Length, raw_data))
Call:
lm(formula = Price ~ Length, data = raw_data)
Residuals:
Min 1Q Median 3Q Max
-13.901 -2.854 -0.741 1.931 16.181
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.94613 0.50918 17.57 <2e-16 ***
Length 0.73168 0.03953 18.51 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.308 on 193 degrees of freedom
Multiple R-squared: 0.6397, Adjusted R-squared: 0.6378
F-statistic: 342.6 on 1 and 193 DF, p-value: < 2.2e-16
summary(lm(formula = log(Price) ~ log(Length), raw_data))
Call:
lm(formula = log(Price) ~ log(Length), data = raw_data)
Residuals:
Min 1Q Median 3Q Max
-0.70368 -0.15501 -0.01661 0.15170 0.59211
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.56380 0.07278 21.49 <2e-16 ***
log(Length) 0.53875 0.03282 16.42 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2466 on 193 degrees of freedom
Multiple R-squared: 0.5827, Adjusted R-squared: 0.5805
F-statistic: 269.5 on 1 and 193 DF, p-value: < 2.2e-16
summary(lm(formula = Price ~ weight, raw_data))
Call:
lm(formula = Price ~ weight, data = raw_data)
Residuals:
Min 1Q Median 3Q Max
-14.7993 -2.6207 -0.6631 2.5396 13.8357
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.2275 0.5602 14.69 <2e-16 ***
weight 4.8312 0.2718 17.78 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.419 on 193 degrees of freedom
Multiple R-squared: 0.6208, Adjusted R-squared: 0.6189
F-statistic: 316 on 1 and 193 DF, p-value: < 2.2e-16
summary(lm(formula = log(Price) ~ log(weight), raw_data))
Call:
lm(formula = log(Price) ~ log(weight), data = raw_data)
Residuals:
Min 1Q Median 3Q Max
-0.71123 -0.15340 -0.01343 0.17735 0.69552
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.50618 0.02333 107.42 <2e-16 ***
log(weight) 0.56453 0.03718 15.18 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2577 on 193 degrees of freedom
Multiple R-squared: 0.5443, Adjusted R-squared: 0.5419
F-statistic: 230.5 on 1 and 193 DF, p-value: < 2.2e-16
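Two remarks help read these outputs. With a single regressor, the F-test and the slope t-test are the same test (F = t², e.g. 18.51² ≈ 342.6 for `Price ~ Length`). In the log-log models the slope is an elasticity: 0.539 means that a 1% longer cap is associated with a price roughly 0.54% higher. The sketch below checks both facts on simulated data; the data-generating values are hypothetical, loosely inspired by the fitted `log(Price) ~ log(Length)` model.

```r
set.seed(1)
n <- 195
len <- runif(n, min = 3, max = 45)                          # hypothetical lengths
price <- exp(1.56 + 0.54 * log(len) + rnorm(n, sd = 0.25))  # hypothetical prices

fit <- lm(log(price) ~ log(len))
s <- summary(fit)

# With one regressor, the overall F statistic equals the squared slope t value
t_slope <- s$coefficients["log(len)", "t value"]
f_stat <- unname(s$fstatistic["value"])
all.equal(f_stat, t_slope^2)

# The log-log slope is an elasticity: % change in price for a 1% change in length
elasticity <- unname(coef(fit)["log(len)"])
pct_change_per_1pct <- 100 * (1.01^elasticity - 1)
```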
c) Does the Price depend on the Impermeability? Shape?
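The plots below suggest that Price depends on Impermeability: the median price of Type 2 caps is markedly higher than that of Type 1 caps (no formal test has been run at this stage). Price also seems to depend on Shape, although less strongly. Shape 2 caps tend to be more expensive (the catdes output for Shape below gives a mean price of 22.3 for Shape 2 against 16.6 overall), while Shapes 3 and 4 contain only 8 caps each, so their boxes should be read with caution.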
# Dot plots and boxplots of Price by Impermeability and by Shape
impermeability_plot_1 <- ggdotplot(raw_data, x = "Impermeability", y = "Price", color = "Impermeability",
                                   palette = "jco", binwidth = 1, legend = "none")
shape_plot_1 <- ggdotplot(raw_data, x = "Shape", y = "Price", color = "Shape",
                          palette = "npg", binwidth = 1, legend = "none")
impermeability_plot_2 <- ggboxplot(raw_data, x = "Impermeability", y = "Price", color = "Impermeability",
                                   palette = "jco", legend = "none")
shape_plot_2 <- ggboxplot(raw_data, x = "Shape", y = "Price", color = "Shape",
                          palette = "npg", legend = "none")
# Top row: Impermeability; bottom row: Shape
ggarrange(ggarrange(impermeability_plot_1, impermeability_plot_2, ncol = 2, nrow = 1),
          ggarrange(shape_plot_1, shape_plot_2, ncol = 2, nrow = 1),
          ncol = 1, nrow = 2)
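The plots alone do not say whether the difference in medians is statistically significant. One way to check is a Kruskal-Wallis rank test (`kruskal.test()` in base R), which does not assume normally distributed prices; a one-way ANOVA is the parametric alternative. The data below are simulated for illustration only, with group sizes and rough means and SDs similar to the Type 1 and Type 2 summaries reported by `catdes()` later in this document. On the real data, the analogous calls would be `kruskal.test(Price ~ Impermeability, data = raw_data)` and the same for `Shape`.

```r
set.seed(2)
# Simulated prices: group sizes and rough means/SDs mimic the catdes() summary
sim <- data.frame(
  Impermeability = factor(rep(c("Type 1", "Type 2"), times = c(168, 23))),
  Price = c(rnorm(168, mean = 14.8, sd = 4.8), rnorm(23, mean = 29.3, sd = 8.5))
)

# Rank-based test of H0: the Price distribution is the same in every group
kw <- kruskal.test(Price ~ Impermeability, data = sim)
kw$p.value

# Parametric counterpart: one-way ANOVA F-test
summary(aov(Price ~ Impermeability, data = sim))
```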
d) Which is the less expensive Supplier?
The answer to this question depends on the definition of expensive.
First, consider absolute metrics (visible on the boxplot):
1) Minimum price: Supplier B offers the cheapest cap (6.477451), but Supplier B also has the most expensive one (46.610372).
2) Average price: Supplier C is cheapest (14.88869).
Second, consider relative metrics (price per unit of size):
3) Average Price / average Length: Supplier A is cheapest (1.505043).
4) Average Price / average weight: Supplier A is cheapest (9.013902).
5) Average Price / average Diameter: Supplier A is cheapest (11.95632).
These results suggest that Supplier A has the cheapest average price per unit of size.
The analysis is, however, incomplete, because "cheapest" has not been defined. The scatter and box plots below suggest that suppliers may cater to different product ranges, and the comparison ignores the categorical variables, which could show which supplier is cheapest for given product features. Furthermore, we have not performed statistical tests of the significance of these differences.
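The per-unit figures above are ratios of means (mean price divided by mean length). The mean of the per-cap ratios is another reasonable definition, and the two can rank suppliers differently, which reinforces the point that "cheapest" needs a definition. The toy data below are hypothetical and chosen to show such a reversal.

```r
# Hypothetical toy data: two suppliers, two caps each
caps <- data.frame(
  Supplier = c("A", "A", "B", "B"),
  Price    = c(10, 30, 12, 12),
  Length   = c(5, 10, 3, 8)
)
by_supplier <- split(caps, caps$Supplier)

# Definition used above: mean price / mean length
ratio_of_means <- sapply(by_supplier, function(d) mean(d$Price) / mean(d$Length))
# Alternative: average of the per-cap price / length ratios
mean_of_ratios <- sapply(by_supplier, function(d) mean(d$Price / d$Length))

ratio_of_means  # A 2.667, B 2.182: B looks cheaper
mean_of_ratios  # A 2.500, B 2.750: A looks cheaper
```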
supplier_plot_1 <- ggboxplot(raw_data,x="Supplier",y="Price",color = "Supplier", palette = c("darkblue","red","darkgreen"),legend="none") + rotate()
supplier_plot_2 <- ggscatter(raw_data,x="Length",y="Price",color = "Supplier", palette = c("darkblue","red","darkgreen"),xscale= "log10", yscale="log10")
supplier_plot_3 <- ggscatter(raw_data,x="weight",y="Price",color = "Supplier", palette = c("darkblue","red","darkgreen"),xscale= "log10", yscale="log10")
supplier_plot_4 <- ggscatter(raw_data,x="Diameter",y="Price",color = "Supplier", palette = c("darkblue","red","darkgreen"),xscale= "log10", yscale="log10")
supplier_statistics <- raw_data %>%
  group_by(Supplier) %>%
  summarise("Average Price" = mean(Price),
            "Average Length" = mean(Length),
            "Average weight" = mean(weight),
            "Average Diameter" = mean(Diameter),
            # Per-unit prices are ratios of group means, not means of per-cap ratios
            "Average Price / Length" = mean(Price) / mean(Length),
            "Average Price / weight" = mean(Price) / mean(weight),
            "Average Price / Diameter" = mean(Price) / mean(Diameter))
supplier_plot_1
supplier_plot_2
supplier_plot_3
supplier_plot_4
head(supplier_statistics)
3) One important point in exploratory data analysis is identifying potential outliers. Which points look suspect with respect to the Mature.Volume variable? Give the characteristics (other features) of the observations that seem suspect.
Four data points seem suspect. Their Mature.Volume (800,000) is far above the third quartile (115,000), and they share the same values of Diameter, weight, nb.of.pieces, Impermeability, Finishing, Raw.Material and Mature.Volume, differing only in Supplier, Price and Length. This suggests an error when collating the data (for example, a system default value), although the data alone cannot confirm it.
Mature.Volume_plot <- gghistogram(raw_data,x="Mature.Volume",y="..count..", color = "darkblue", fill = "lightgrey") + theme_minimal()
Using `bins = 30` by default. Pick better value with the argument `bins`.
Mature.Volume_plot
raw_data %>% filter (Mature.Volume > 6e+05 )
For the rest of the analysis, the 4 data points above are disregarded.
# dplyr is already attached above; keep only caps with a plausible Mature.Volume
raw_data <- raw_data %>% filter(Mature.Volume < 6e+05)
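The argument for discarding these points rests on the suspect caps sharing one profile apart from Supplier, Price and Length. A minimal sketch of how that check could be automated: select the rows above the threshold, count the distinct combinations of the shared columns (one combination means the suspect rows are identical on those features), and list the columns on which they differ. The data frame is a hypothetical toy example with only a few of the real columns; on the real data the same lines would run on `raw_data` before filtering.

```r
# Hypothetical toy data: four caps with an extreme Mature.Volume, one typical cap
caps <- data.frame(
  Supplier      = c("Supplier A", "Supplier B", "Supplier B", "Supplier C", "Supplier B"),
  Diameter      = c(1.2, 1.2, 1.2, 1.2, 0.8),
  weight        = c(1.5, 1.5, 1.5, 1.5, 1.1),
  Raw.Material  = c("PP", "PP", "PP", "PP", "PS"),
  Mature.Volume = c(8e5, 8e5, 8e5, 8e5, 4.5e4),
  Price         = c(12.1, 15.3, 13.8, 14.0, 11.2)
)

suspect <- caps[caps$Mature.Volume > 6e5, ]
shared_cols <- c("Diameter", "weight", "Raw.Material", "Mature.Volume")

# Number of distinct profiles among the suspect rows (1 = all identical)
n_profiles <- nrow(unique(suspect[, shared_cols]))

# Columns on which the suspect rows differ
differing <- names(suspect)[sapply(suspect, function(col) length(unique(col)) > 1)]
differing  # "Supplier" "Price"
```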
4) Perform a PCA on the ScrewCaps dataset, and explain briefly what the aims of a PCA are and how categorical variables are handled.
Principal component analysis (PCA) is a technique that takes high-dimensional data and uses the dependencies between the variables to represent it in a more tractable, lower-dimensional form without losing too much information: we try to capture the essence of high-dimensional data in a low-dimensional representation. The aim of PCA is to summarize the linear relationships between variables by identifying the principal dimensions of variability. This may serve compression, denoising, data completion, anomaly detection, or preprocessing before supervised learning (to improve performance, or as regularization to reduce overfitting).
Categorical variables cannot be represented in the same way as supplementary quantitative variables, because the correlation between a categorical variable and the principal components cannot be computed. Here the categorical variables are handled as supplementary variables on a purely illustrative basis: they are not used to compute the distances between individuals. Each category is represented at the barycentre of all the individuals having that category, so a category on the PCA performed below can be regarded as the mean individual of the set of individuals who have it.
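A minimal base-R sketch of the barycentre idea, using `prcomp()` on simulated data as a stand-in for `FactoMineR::PCA()`: the coordinate of each category of a supplementary factor is the mean of the principal-component scores of the individuals in that category. The variables and the factor `grp` are hypothetical, and PC signs are arbitrary, so only relative positions matter.

```r
set.seed(3)
n <- 120
grp <- factor(sample(c("PP", "PS", "ABS"), n, replace = TRUE))  # hypothetical factor
size <- rnorm(n) + ifelse(grp == "PS", 1.5, 0)                   # PS caps made larger
X <- cbind(Diameter = size + rnorm(n, sd = 0.2),
           Length   = size + rnorm(n, sd = 0.2),
           Volume   = rnorm(n))

pca <- prcomp(X, scale. = TRUE)   # standardized PCA on the active variables only
scores <- pca$x[, 1:2]

# Barycentre of each category = mean PC score of its individuals
counts <- as.vector(table(grp))
bary <- rowsum(scores, grp) / counts
bary
```

Because the scores are centred, the barycentres weighted by category size sum to zero, which is why a category far from the origin (like PS here) marks a group that departs from the average cap.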
Given that our ultimate goal is to explore the data before a multiple regression, it is advisable to use the explanatory variables of the regression model as active variables of the PCA, and to project the variable to be explained (the dependent variable) as a supplementary variable. This shows the relationships among the explanatory variables, and thus whether some of them are redundant and should be dropped. It also gives an idea of the quality of the regression: if the dependent variable is well projected (close to the edge of the correlation circle), a linear model on the active variables is likely to fit well. We therefore take Price as a supplementary variable.
The analysis therefore uses 6 supplementary variables:
- 1 quantitative variable (Price);
- 5 qualitative variables (Supplier, Shape, Impermeability, Finishing and Raw.Material).
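# Active variables: Diameter, weight, nb.of.pieces, Mature.Volume and Length.
# Supplementary: qualitative columns 1, 5, 6, 7, 9 and Price (column 10).
# `select` is an argument of plot.PCA(), not of PCA(), so it is not passed here.
res.pca <- PCA(raw_data, quali.sup = c(1, 5, 6, 7, 9), quanti.sup = 10, graph = FALSE)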
5) Compute the correlation matrix between the variables and comment on it with respect to the correlation circle.
The variables are first centred and standardized, then the correlation matrix is computed (standardizing does not change the correlations, but it is the scaling used by a standardized PCA). All variable vectors lie quite close to the boundary of the correlation circle on the variables plot, so the variables are fairly well projected onto the first two dimensions. We now turn to the correlations between variables.
The correlations can be read from the angles between the variable vectors on the correlation circle, and they agree with the correlation matrix:
- Diameter, Length and weight are very strongly correlated: the angles between them are close to 0, and their pairwise correlations range from 0.96 to 0.9997.
- These three variables form angles slightly wider than a right angle with both nb.of.pieces and Mature.Volume, which indicates weak negative correlations (-0.15 to -0.17 with nb.of.pieces, -0.29 to -0.31 with Mature.Volume).
- Price (a supplementary variable, so absent from the matrix) is highly correlated with these three variables; its correlation with the first dimension is 0.80.
- Likewise, Mature.Volume and nb.of.pieces form an angle slightly wider than a right angle, consistent with their weak negative correlation (-0.07).
don <- as.matrix(raw_data[,-c(1,5,6,7,9,10)]) %>% scale()
don_correlation <- cor(don)
don_correlation
Diameter weight nb.of.pieces Mature.Volume Length
Diameter 1.0000000 0.9622544 -0.14869500 -0.29164724 0.9996963
weight 0.9622544 1.0000000 -0.16884367 -0.31321323 0.9627460
nb.of.pieces -0.1486950 -0.1688437 1.00000000 -0.07462463 -0.1463770
Mature.Volume -0.2916472 -0.3132132 -0.07462463 1.00000000 -0.2936330
Length 0.9996963 0.9627460 -0.14637705 -0.29363295 1.0000000
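Two facts underlie this reading. First, standardizing does not change correlations, so `scale()` above is harmless but not required for `cor()`. Second, in a standardized PCA the coordinates of a variable on the correlation circle are its correlations with the component scores (equivalently, loading × square root of the eigenvalue), and a supplementary quantitative variable such as Price is placed the same way, by correlating it with the scores. The base-R sketch below checks both facts on simulated data; all names are hypothetical and `prcomp()` stands in for `FactoMineR::PCA()`.

```r
set.seed(4)
n <- 150
size <- rnorm(n)
X <- cbind(Diameter = size + rnorm(n, sd = 0.1),
           Length   = size + rnorm(n, sd = 0.1),
           Pieces   = rnorm(n))
price <- 2 * size + rnorm(n)   # hypothetical supplementary variable

# Correlations are unchanged by centring and scaling
all.equal(cor(X), cor(scale(X)))

pca <- prcomp(X, scale. = TRUE)

# Correlation-circle coordinates: loading * sqrt(eigenvalue) = cor(variable, scores)
circle <- pca$rotation %*% diag(pca$sdev)
all.equal(unname(circle), unname(cor(X, pca$x)))

# A supplementary variable is projected by correlating it with the scores
price_coord <- cor(price, pca$x[, 1:2])
sqrt(sum(price_coord^2))   # distance from the origin; close to 1 = well projected
```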
plot.PCA(res.pca,choix = c("var"))+theme_minimal()
NULL
6) On what kind of relationship does PCA focus? Is it a problem?
PCA focuses on linear relationships between continuous variables. More complex links also exist, such as quadratic, logarithmic or exponential relationships, so this may seem restrictive; in practice, however, many relationships can be treated as linear, at least as a first approximation. There are nonetheless clearly non-linear datasets on which PCA performs poorly (e.g. a spiral dataset). Furthermore, categorical variables cannot be active variables in PCA, which can also be restrictive.
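A short illustration of this limitation: `y = x²` is completely determined by `x`, yet on a grid symmetric around 0 their linear correlation is essentially zero, so a PCA of the two standardized variables finds no dominant direction (both eigenvalues are about 1). The example is a toy construction, not taken from the screw-cap data.

```r
# A perfect but non-linear (quadratic) dependence
x <- seq(-1, 1, length.out = 201)
y <- x^2

cor(x, y)   # essentially 0: no linear association

# PCA of the standardized pair: two near-equal eigenvalues, nothing to compress
pca <- prcomp(cbind(x, y), scale. = TRUE)
pca$sdev^2
```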
7) Comment on the PCA outputs
Comment on the position of the categories Impermeability=Type 2 and Raw.Material=PS.
The coordinates of Type 2 on the first two principal components are (3.289, 0.013), and those of PS are (2.674, -0.253).
Both categories have a high coordinate on the first principal component and a coordinate close to 0 on the second. Since the correlation circle shows that the first component is highly correlated with Price, Diameter, Length and weight, this suggests that caps with Impermeability Type 2 and caps made of PS tend to have high values of these variables. The dimdesc output below agrees: Type 2 and PS have the largest positive estimates on Dim.1.
res.pca$quali.sup$coord
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
Supplier A 0.54805992 -0.054566515 -0.214051234 0.0227636641 0.0058684306
Supplier B -0.06543165 -0.125589918 -0.026949041 0.0018781980 -0.0006195266
Supplier C -0.44356100 1.440695488 0.728281700 -0.0670085402 -0.0056067533
Shape 1 -0.42564773 -0.137916238 -0.214123559 0.0065253174 -0.0009308749
Shape 2 1.42726960 0.394279456 0.383010989 -0.0314353290 0.0018793388
Shape 3 -0.55969671 -0.332207048 0.059604995 0.1360698967 -0.0029996610
Shape 4 -0.55191919 0.355523978 1.265466019 -0.0652825793 0.0075550964
Type 1 -0.45031131 -0.001823621 -0.009194259 0.0008200692 -0.0005392760
Type 2 3.28923043 0.013320364 0.067158065 -0.0059900708 0.0039390597
Hot Printing -0.28600503 -0.037712714 0.192161713 0.0717729126 -0.0006010224
Lacquering 0.13745978 0.018125491 -0.092356792 -0.0344955084 0.0002888635
ABS 0.87599666 0.220028373 -0.581512708 0.0043591307 -0.0032513149
PP -0.61062316 0.013651457 0.120658551 0.0042947466 -0.0005390562
PS 2.67437709 -0.253323291 -0.198579404 -0.0273071254 0.0056116038
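Comment on the percentage of inertia
The first two components explain 83.5% of the total inertia and the first three 99.0%, while components 4 and 5 carry about 1% and less than 0.01% respectively. These last eigenvalues are almost zero because Diameter, Length and weight are nearly collinear (e.g. r = 0.9997 between Diameter and Length). Only the first two eigenvalues exceed 1, the average eigenvalue for standardized data.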
res.pca$eig
eigenvalue percentage of variance cumulative percentage of variance
comp 1 3.1071215080 62.142430160 62.14243
comp 2 1.0669070766 21.338141532 83.48057
comp 3 0.7768681861 15.537363723 99.01794
comp 4 0.0488056018 0.976112036 99.99405
comp 5 0.0002976274 0.005952549 100.00000
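The percentages follow directly from the eigenvalues: for a standardized PCA the total inertia equals the number of active variables (here 5), and each component's share is its eigenvalue divided by that total. The sketch below reproduces the table from the printed eigenvalues and applies the Kaiser rule (keep components whose eigenvalue exceeds the average eigenvalue, which is 1 for standardized data), one common heuristic among several.

```r
# Eigenvalues copied from the res.pca$eig output above
eig <- c(3.1071215080, 1.0669070766, 0.7768681861, 0.0488056018, 0.0002976274)

sum(eig)                      # total inertia = number of active variables (5)
pct <- 100 * eig / sum(eig)   # percentage of variance per component
cum_pct <- cumsum(pct)        # cumulative percentage
kaiser <- which(eig > 1)      # components with an eigenvalue above the average (1)
```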
dimdesc(res.pca, axes = 1:1)
$Dim.1
$Dim.1$quanti
correlation p.value
Length 0.9853764 3.259183e-147
Diameter 0.9851090 1.784008e-146
weight 0.9774643 1.263294e-129
Price 0.7960132 4.472456e-43
nb.of.pieces -0.2017085 5.139018e-03
Mature.Volume -0.4118157 3.243173e-09
$Dim.1$quali
R2 p.value
Impermeability 0.4767041 2.203784e-28
Raw.Material 0.4309747 9.602186e-24
Shape 0.2024825 3.268025e-09
$Dim.1$category
Estimate p.value
Type 2 1.8697709 2.203784e-28
PS 1.6944602 3.078822e-20
Shape 2 1.4547681 6.874053e-11
ABS -0.1039202 1.566216e-02
Shape 1 -0.3981492 5.692581e-07
PP -1.5905400 1.465743e-20
Type 1 -1.8697709 2.203784e-28
Comment on the results and describe one cluster precisely, adding a Fisher test (analysis-of-variance F-test).
Cluster 1 is made of individuals sharing:
- high values for the variable Mature.Volume;
- low values for the variables nb.of.pieces, Price, weight, Length and Diameter (variables sorted from the weakest).
Cluster 2 is made of individuals sharing:
- high values for the variable nb.of.pieces;
- low values for the variables Mature.Volume, Diameter, Length, weight and Price (variables sorted from the weakest).
Cluster 3 is made of individuals such as 89, 90, 131, 161, 163 and 164. This group is characterized by:
- high values for the variables Length, Diameter, weight and Price (variables sorted from the strongest);
- low values for the variables nb.of.pieces and Mature.Volume (variables sorted from the weakest).
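The descriptions above rank the variables by how strongly they separate the clusters. The corresponding Fisher test is a one-way analysis of variance of each variable on the cluster factor, and the Eta2 reported by FactoMineR is, to our understanding, the between-cluster share of the total sum of squares. A minimal base-R sketch on simulated data, with Ward clustering via `hclust()` standing in for `HCPC()` (which, by default, also consolidates the partition with k-means):

```r
set.seed(5)
# Simulated data: three groups of 30 caps with shifted means on two variables
centres <- rep(c(0, 3, 6), each = 30)
X <- cbind(Length = centres + rnorm(90), Price = centres + rnorm(90))

# Ward hierarchical clustering on standardized data, cut into 3 clusters
tree <- hclust(dist(scale(X)), method = "ward.D2")
df <- data.frame(X, cluster = factor(cutree(tree, k = 3)))

# Fisher (one-way ANOVA) test of H0: Price has the same mean in every cluster
fit <- aov(Price ~ cluster, data = df)
tab <- summary(fit)[[1]]
f_pvalue <- tab[["Pr(>F)"]][1]

# Eta squared: between-cluster sum of squares / total sum of squares
eta2 <- tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])
```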
If someone asks you why you have selected k components and not k + 1 or k - 1, what is your answer? (Could you suggest a strategy to assess the stability of the approach? Are there many differences between the clustering obtained on k components and on the initial data?)
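One possible answer, under simplifying assumptions: keep the number of components that carries most of the inertia (here the first three explain 99% of it), then check that the partition barely changes when one component is added or removed, or when the clustering is run on the original standardized variables. The sketch below measures agreement between partitions with the Rand index (the share of pairs of individuals on which two partitions agree), computed by hand through a hypothetical helper `rand_index()`. It uses simulated data and Ward clustering via `hclust()` instead of `HCPC()`. Bootstrap resampling of the individuals would be a natural extension.

```r
# Rand index: proportion of pairs of individuals grouped consistently by a and b
rand_index <- function(a, b) {
  n <- length(a)
  agree <- outer(a, a, "==") == outer(b, b, "==")
  (sum(agree) - n) / (n * (n - 1))   # drop the n diagonal self-pairs
}

# Hypothetical helper: Ward clustering cut into k groups
ward_clusters <- function(data, k = 3) {
  cutree(hclust(dist(data), method = "ward.D2"), k = k)
}

set.seed(6)
centres <- rep(c(0, 3, 6), each = 30)
X <- cbind(V1 = centres + rnorm(90), V2 = centres + rnorm(90),
           V3 = rnorm(90), V4 = rnorm(90), V5 = rnorm(90))

Z <- scale(X)
scores <- prcomp(Z)$x       # PC scores of the standardized data
full <- ward_clusters(Z)    # clustering on all standardized variables

# Agreement with the full-data partition when keeping 2, 3 or 4 components
stability <- sapply(2:4, function(k) rand_index(ward_clusters(scores[, 1:k, drop = FALSE]), full))
stability
```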
res.pca <- PCA(raw_data,quali.sup = c(1,5,6,7,9),quanti.sup = 10,ncp=4)
res.hcpc <- HCPC(res.pca, nb.clust = -1)
res.pca <- PCA(raw_data,quali.sup = c(1,5,6,7,9),quanti.sup = 10,ncp=3)
res.hcpc <- HCPC(res.pca, nb.clust = -1)
res.pca <- PCA(raw_data,quali.sup = c(1,5,6,7,9),quanti.sup = 10,ncp=2)
res.hcpc <- HCPC(res.pca, nb.clust = -1)
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)
Chi-squared approximation may be incorrect (warning repeated 4 times)
res.hcpc
**Results for the Hierarchical Clustering on Principal Components**
name description
1 "$data.clust" "dataset with the cluster of the individuals"
2 "$desc.var" "description of the clusters by the variables"
3 "$desc.var$quanti.var" "description of the cluster var. by the continuous var."
4 "$desc.var$quanti" "description of the clusters by the continuous var."
5 "$desc.var$test.chi2" "description of the cluster var. by the categorical var."
6 "$desc.axes$category" "description of the clusters by the categories."
7 "$desc.axes" "description of the clusters by the dimensions"
8 "$desc.axes$quanti.var" "description of the cluster var. by the axes"
9 "$desc.axes$quanti" "description of the clusters by the axes"
10 "$desc.ind" "description of the clusters by the individuals"
11 "$desc.ind$para" "parangons of each clusters"
12 "$desc.ind$dist" "specific individuals"
13 "$call" "summary statistics"
14 "$call$t" "description of the tree"
plot.HCPC(res.hcpc, choice = 'map', draw.tree = FALSE, title = '', select=c("12"))
res.pca <- PCA(raw_data,quali.sup = c(1,5,6,7,9),quanti.sup = 10,ncp=3)
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)
Chi-squared approximation may be incorrect (warning repeated 4 times)
Characterization of each Supplier, Shape and Impermeability category
catdes(raw_data, num.var=1)
Chi-squared approximation may be incorrect (warning repeated 4 times)
$test.chi2
p.value df
Raw.Material 9.049049e-05 4
Impermeability 1.088731e-02 2
$category
$category$`Supplier A`
Cla/Mod Mod/Cla Global p.value v.test
Raw.Material=PS 42.30769 37.93103 13.61257 0.0002998155 3.615459
Impermeability=Type 2 34.78261 27.58621 12.04188 0.0130149176 2.483361
Shape=Shape 2 26.66667 41.37931 23.56021 0.0213728107 2.301333
Raw.Material=ABS 0.00000 0.00000 10.99476 0.0254288561 -2.234825
Impermeability=Type 1 12.50000 72.41379 87.95812 0.0130149176 -2.483361
$category$`Supplier B`
Cla/Mod Mod/Cla Global p.value v.test
Raw.Material=ABS 100.00000 14.18919 10.99476 0.003330616 2.935453
Raw.Material=PS 57.69231 10.13514 13.61257 0.015928453 -2.410551
Shape=Shape 2 60.00000 18.24324 23.56021 0.002374481 -3.038894
$category$`Supplier C`
Cla/Mod Mod/Cla Global p.value v.test
Raw.Material=PP 9.722222 100 75.39267 0.01626019 2.403023
$quanti.var
Eta2 P-value
nb.of.pieces 0.2137072 1.530822e-10
$quanti
$quanti$`Supplier A`
NULL
$quanti$`Supplier B`
v.test Mean in category Overall mean sd in category Overall sd p.value
nb.of.pieces -2.817845 3.959459 4.115183 1.240523 1.413225 0.004834708
$quanti$`Supplier C`
v.test Mean in category Overall mean sd in category Overall sd p.value
nb.of.pieces 6.345875 6.428571 4.115183 1.720228 1.413225 2.211654e-10
attr(,"class")
[1] "catdes" "list "
catdes(raw_data, num.var=5)
Chi-squared approximation may be incorrect (warning repeated 4 times)
$test.chi2
p.value df
Impermeability 2.873602e-16 3
Raw.Material 8.762044e-07 6
Finishing 1.040072e-02 3
$category
$category$`Shape 1`
Cla/Mod Mod/Cla Global p.value v.test
Impermeability=Type 1 76.785714 99.2307692 87.95812 1.043033e-11 6.800436
Raw.Material=PP 72.222222 80.0000000 75.39267 3.596432e-02 2.097331
Finishing=Lacquering 72.868217 72.3076923 67.53927 4.420603e-02 2.012132
Finishing=Hot Printing 58.064516 27.6923077 32.46073 4.420603e-02 -2.012132
Raw.Material=PS 30.769231 6.1538462 13.61257 3.360691e-05 -4.147538
Impermeability=Type 2 4.347826 0.7692308 12.04188 1.043033e-11 -6.800436
$category$`Shape 2`
Cla/Mod Mod/Cla Global p.value v.test
Impermeability=Type 2 95.65217 48.88889 12.04188 2.151940e-15 7.932260
Raw.Material=PS 69.23077 40.00000 13.61257 1.004609e-07 5.325888
Supplier=Supplier A 41.37931 26.66667 15.18325 2.137281e-02 2.301333
Supplier=Supplier B 18.24324 60.00000 77.48691 2.374481e-03 -3.038894
Raw.Material=PP 16.66667 53.33333 75.39267 2.022645e-04 -3.716171
Impermeability=Type 1 13.69048 51.11111 87.95812 2.151940e-15 -7.932260
$category$`Shape 3`
NULL
$category$`Shape 4`
Cla/Mod Mod/Cla Global p.value v.test
Finishing=Hot Printing 9.677419 75 32.46073 0.0169336 2.388146
Finishing=Lacquering 1.550388 25 67.53927 0.0169336 -2.388146
$quanti.var
Eta2 P-value
Price 0.24285191 2.771217e-11
Diameter 0.23221716 9.994081e-11
Length 0.23112294 1.139178e-10
weight 0.19722569 5.965369e-09
nb.of.pieces 0.10533516 1.120672e-04
Mature.Volume 0.05693699 1.177220e-02
$quanti
$quanti$`Shape 1`
v.test Mean in category Overall mean sd in category Overall sd p.value
nb.of.pieces -3.721118 3.853846 4.115183 1.2716527 1.4132247 1.983430e-04
weight -5.069026 1.418671 1.714121 0.7867181 1.1728539 3.998565e-07
Length -5.469752 8.191771 10.329589 5.0759423 7.8647827 4.506649e-08
Diameter -5.489199 1.026728 1.294639 0.6306647 0.9821218 4.037612e-08
Price -6.344431 14.290942 16.552332 4.8726895 7.1724314 2.232495e-10
$quanti$`Shape 2`
v.test Mean in category Overall mean sd in category Overall sd p.value
Diameter 6.616403 2.143782 1.294639 1.409033 9.821218e-01 3.680436e-11
Length 6.603559 17.116281 10.329589 11.260640 7.864783e+00 4.014035e-11
Price 6.176070 22.340911 16.552332 9.620363 7.172431e+00 6.571699e-10
weight 6.118100 2.651800 1.714121 1.698673 1.172854e+00 9.469770e-10
nb.of.pieces 2.865929 4.644444 4.115183 1.607698 1.413225e+00 4.157871e-03
Mature.Volume -2.005008 58355.222222 82206.026178 68318.473831 9.103190e+04 4.496223e-02
$quanti$`Shape 3`
NULL
$quanti$`Shape 4`
v.test Mean in category Overall mean sd in category Overall sd p.value
nb.of.pieces 3.078997 5.62500 4.115183 9.921567e-01 1.413225 0.002076987
Mature.Volume 2.629122 165250.00000 82206.026178 1.132649e+05 91031.901051 0.008560561
Price 2.044271 21.63988 16.552332 3.822103e+00 7.172431 0.040926790
attr(,"class")
[1] "catdes" "list "
catdes(raw_data, num.var=6)
Chi-squared approximation may be incorrect (warning repeated 3 times)
$test.chi2
p.value df
Raw.Material 4.088669e-21 2
Shape 2.873602e-16 3
Supplier 1.088731e-02 2
$category
$category$`Type 1`
Cla/Mod Mod/Cla Global p.value v.test
Shape=Shape 1 99.23077 76.785714 68.06283 1.043033e-11 6.800436
Raw.Material=PP 97.91667 83.928571 75.39267 1.773212e-11 6.723573
Supplier=Supplier A 72.41379 12.500000 15.18325 1.301492e-02 -2.483361
Raw.Material=PS 30.76923 4.761905 13.61257 5.429478e-15 -7.816541
Shape=Shape 2 51.11111 13.690476 23.56021 2.151940e-15 -7.932260
$category$`Type 2`
Cla/Mod Mod/Cla Global p.value v.test
Shape=Shape 2 48.8888889 95.652174 23.56021 2.151940e-15 7.932260
Raw.Material=PS 69.2307692 78.260870 13.61257 5.429478e-15 7.816541
Supplier=Supplier A 27.5862069 34.782609 15.18325 1.301492e-02 2.483361
Raw.Material=PP 2.0833333 13.043478 75.39267 1.773212e-11 -6.723573
Shape=Shape 1 0.7692308 4.347826 68.06283 1.043033e-11 -6.800436
$quanti.var
Eta2 P-value
Diameter 0.47062626 6.604215e-28
Length 0.46804072 1.049429e-27
weight 0.45675032 7.728264e-27
Price 0.43301606 4.512224e-25
Mature.Volume 0.07171395 1.801495e-04
$quanti
$quanti$`Type 1`
v.test Mean in category Overall mean sd in category Overall sd p.value
Mature.Volume 3.691294 91225.988095 82206.026178 9.338486e+04 9.103190e+04 2.231162e-04
Price -9.070449 14.805996 16.552332 4.819967e+00 7.172431e+00 1.185272e-19
weight -9.315716 1.420835 1.714121 6.707159e-01 1.172854e+00 1.211330e-20
Length -9.430150 8.338742 10.329589 4.357114e+00 7.864783e+00 4.095012e-21
Diameter -9.456161 1.045344 1.294639 5.411724e-01 9.821218e-01 3.194554e-21
$quanti$`Type 2`
v.test Mean in category Overall mean sd in category Overall sd p.value
Diameter 9.456161 3.115573 1.294639 1.449522 9.821218e-01 3.194554e-21
Length 9.430150 24.871426 10.329589 11.600832 7.864783e+00 4.095012e-21
weight 9.315716 3.856391 1.714121 1.708742 1.172854e+00 1.211330e-20
Price 9.070449 29.308174 16.552332 8.516118 7.172431e+00 1.185272e-19
Mature.Volume -3.691294 16321.086957 82206.026178 13496.587327 9.103190e+04 2.231162e-04
attr(,"class")
[1] "catdes" "list "
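The `v.test` column of `catdes()` compares the mean of a quantitative variable within a category with the overall mean, scaled by the standard deviation expected for a random sample of the same size drawn without replacement. The sketch below is our reconstruction of that formula, checked against the output above rather than taken from the FactoMineR source. It reproduces the value 6.345875 reported for nb.of.pieces in Supplier C, using n = 191 caps after removing the four outliers and 14 caps from Supplier C, as well as the value -2.817845 for Supplier B (148 caps).

```r
# v-test of a category mean: (mean_k - mean) / sqrt(s^2 / n_k * (n - n_k) / (n - 1))
v_test <- function(mean_cat, mean_all, sd_all, n_cat, n) {
  se <- sqrt(sd_all^2 / n_cat * (n - n_cat) / (n - 1))
  (mean_cat - mean_all) / se
}

# nb.of.pieces in Supplier C, values copied from the catdes() output above
v_c <- v_test(mean_cat = 6.428571, mean_all = 4.115183, sd_all = 1.413225,
              n_cat = 14, n = 191)
p_c <- 2 * pnorm(-abs(v_c))   # two-sided p-value under the normal approximation

# nb.of.pieces in Supplier B
v_b <- v_test(mean_cat = 3.959459, mean_all = 4.115183, sd_all = 1.413225,
              n_cat = 148, n = 191)
```

A |v.test| above about 2 corresponds to p < 0.05, which is why catdes lists only categories and variables beyond that threshold.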
# `select` is an argument of the plotting functions, not of FAMD(), so it is not passed here
res.famd_learning_2 <- FAMD(learning_set_clust, ncp = 6, graph = TRUE, sup.var = c(1, 5, 7, 10),
                            axes = c(1, 2), row.w = NULL, tab.comp = NULL)
res.hcpc.famd$call$X$Dim.1
[1] -1.98063919 -1.83264787 -1.83106329 -1.78617733 -1.77810618 -1.76793474 -1.76710263 -1.75740608 -1.74971627 -1.74854492 -1.72234966 -1.66587800 -1.64757197 -1.64122934
[15] -1.63598508 -1.56390795 -1.53944334 -1.53334618 -1.48308272 -1.38978669 -1.31431198 -1.31360993 -1.30659672 -1.30106267 -1.24853337 -1.22972350 -1.22095845 -1.21456405
[29] -1.18418458 -1.18021014 -1.17935581 -1.17927296 -1.17545877 -1.16352170 -1.15631274 -1.15536248 -1.14893262 -1.14368481 -1.13904799 -1.11948077 -1.11592461 -1.11166467
[43] -1.10864477 -1.07786759 -1.07465242 -1.07231358 -1.05375226 -1.03073238 -1.01177639 -1.00275909 -1.00244847 -0.99953402 -0.99685597 -0.99557798 -0.99270885 -0.98055457
[57] -0.98041153 -0.97369797 -0.97319443 -0.96309915 -0.94797090 -0.93951350 -0.92381405 -0.92380588 -0.92312060 -0.92280368 -0.92215711 -0.91733190 -0.91127725 -0.90468216
[71] -0.88924490 -0.88886587 -0.88556501 -0.87235273 -0.86644337 -0.86479528 -0.83275001 -0.81147499 -0.77426760 -0.77398705 -0.77256095 -0.76784626 -0.76735493 -0.76500699
[85] -0.75906301 -0.74956767 -0.74385898 -0.72805741 -0.71957456 -0.70612214 -0.67838613 -0.67821432 -0.64991811 -0.64920144 -0.64269013 -0.63822303 -0.63582084 -0.63163680
[99] -0.62753059 -0.62599476 -0.62186560 -0.60480936 -0.58138172 -0.57247016 -0.56818608 -0.55787880 -0.55368923 -0.54139477 -0.53854440 -0.53480620 -0.52898642 -0.52824415
[113] -0.52616669 -0.52564460 -0.52454183 -0.52449868 -0.51962886 -0.51480527 -0.50739799 -0.49352372 -0.48049116 -0.47854325 -0.47511958 -0.45213593 -0.44366084 -0.41417752
[127] -0.40516410 -0.39864924 -0.39340526 -0.38440624 -0.38336093 -0.38001794 -0.37853964 -0.35220981 -0.34431429 -0.32536820 -0.31982441 -0.31803255 -0.30039211 -0.28421095
[141] -0.28417967 -0.27830495 -0.27577096 -0.26247151 -0.25453002 -0.24784546 -0.23849067 -0.23173017 -0.23034051 -0.22769867 -0.15845660 -0.05646856 -0.05377382 0.05889003
[155] 0.07158955 0.07913132 0.16276548 0.79185578 1.06680161 1.09360971 1.14437872 1.21748198 1.25067158 1.37109701 2.03126674 2.20958460 2.25060435 2.48114155
[169] 2.49540111 2.51611986 2.68062272 3.05674906 3.07969012 3.23130366 3.77350671 3.80195508 4.13207845 4.32016658 4.35027614 4.72999750 5.26696337 5.37357116
[183] 5.86802720 5.87862818 6.58628131 6.60222800 6.60582995 7.20144721 7.28664755 7.39353843 7.91577973